神经网络的出现彻底改变了运动合成领域。然而,学会从给定的分布中无条件合成动作仍然是一项具有挑战性的任务,尤其是当动作高度多样化时。我们提出了Modi,这是一种无条件的生成模型,可以合成各种动作。我们的模型在完全无监督的环境中训练,从多样化,非结构化和未标记的运动数据集中进行了训练,并产生了一个行为良好,高度语义的潜在空间。我们的模型的设计遵循StyleGAN的多产架构,并将其两个关键技术组件调整为运动域:一组样式编码,这些样式编码注入了生成器层次结构的每个级别和映射功能,并形成了一个学习和形成一个分离的潜在空间。我们表明,尽管数据集中缺乏任何结构,但潜在空间可以在语义上聚集,并促进语义编辑和运动插值。此外,我们提出了一种将未见动作转向潜在空间的技术,并展示了基于潜在的运动编辑操作,否则这些动作无法通过天真地操纵明确的运动表示无法实现。我们的定性和定量实验表明,我们的框架达到了最新的合成质量,可以遵循高度多样化的运动数据集的分布。代码和训练有素的模型将在https://sigal-raab.github.io/modi上发布。
translated by 谷歌翻译
多个摄像机制造的视频录制的可用性越来越多,为姿势和运动重建方法中的减少和深度歧义提供了新的方法。然而,多视图算法强烈依赖于相机参数;特别地,相机之间的相对介绍。在不受控制的设置中,这种依赖变为一旦转移到动态捕获一次。我们介绍Flex(免费多视图重建),一个端到端的无参数多视图模型。 Flex是无意义的参数,即它不需要任何相机参数,都不是内在的也不是外在的。我们的关键思想是骨架部件和骨长之间的3D角度是不变的相机位置。因此,学习3D旋转和骨长而不是位置允许预测所有相机视图的公共值。我们的网络采用多个视频流,学习通过新型多视图融合层的融合深度特征,并重建单一一致的骨架,其具有时间上相干的关节旋转。我们展示了人类3.6M和KTH多视图足球II数据集的定量和定性结果,以及动态摄像头捕获的合成多人视频流。我们将模型与最先进的方法进行比较,这些方法没有参与参数,并在没有相机参数的情况下显示,我们在获得相机参数可用时获取可比结果的同时优于较大的余量。我们的项目页面上可以使用代码,培训的模型,视频示例和更多材料。
translated by 谷歌翻译
Despite significant progress in object categorization, in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and to recognize object classes within large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited sized class vocabularies and typically requires separation between supervised and unsupervised classes, allowing former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above mentioned challenges and address problems of supervised, zero-shot, generalized zero-shot and open set recognition using a unified framework. Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. Distance constraints ensure that labeled samples are projected closer to their correct prototypes, in the embedding space, than to others. We illustrate that resulting model shows improvements in supervised, zero-shot, generalized zero-shot, and large open set recognition, with up to 310K class vocabulary on Animal with Attributes and ImageNet datasets.
translated by 谷歌翻译
In recent years the applications of machine learning models have increased rapidly, due to the large amount of available data and technological progress.While some domains like web analysis can benefit from this with only minor restrictions, other fields like in medicine with patient data are strongerregulated. In particular \emph{data privacy} plays an important role as recently highlighted by the trustworthy AI initiative of the EU or general privacy regulations in legislation. Another major challenge is, that the required training \emph{data is} often \emph{distributed} in terms of features or samples and unavailable for classicalbatch learning approaches. In 2016 Google came up with a framework, called \emph{Federated Learning} to solve both of these problems. We provide a brief overview on existing Methods and Applications in the field of vertical and horizontal \emph{Federated Learning}, as well as \emph{Fderated Transfer Learning}.
translated by 谷歌翻译
Recent advances in pixel-level tasks (e.g., segmentation) illustrate the benefit of long-range interactions between aggregated region-based representations that can enhance local features. However, such pixel-to-region associations and the resulting representation, which often take the form of attention, cannot model the underlying semantic structure of the scene (e.g., individual objects and, by extension, their interactions). In this work, we take a step toward addressing this limitation. Specifically, we propose an architecture where we learn to project image features into latent region representations and perform global reasoning across them, using a transformer, to produce contextualized and scene-consistent representations that are then fused with original pixel-level features. Our design enables the latent regions to represent semantically meaningful concepts, by ensuring that activated regions are spatially disjoint and unions of such regions correspond to connected object segments. The resulting semantic global reasoning (SGR) is end-to-end trainable and can be combined with any semantic segmentation framework and backbone. Combining SGR with DeepLabV3 results in a semantic segmentation performance that is competitive to the state-of-the-art, while resulting in more semantically interpretable and diverse region representations, which we show can effectively transfer to detection and instance segmentation. Further, we propose a new metric that allows us to measure the semantics of representations at both the object class and instance level.
translated by 谷歌翻译
Neural architectures can be naturally viewed as computational graphs. Motivated by this perspective, we, in this paper, study neural architecture search (NAS) through the lens of learning random graph models. In contrast to existing NAS methods which largely focus on searching for a single best architecture, i.e, point estimation, we propose GraphPNAS a deep graph generative model that learns a distribution of well-performing architectures. Relying on graph neural networks (GNNs), our GraphPNAS can better capture topologies of good neural architectures and relations between operators therein. Moreover, our graph generator leads to a learnable probabilistic search method that is more flexible and efficient than the commonly used RNN generator and random search methods. Finally, we learn our generator via an efficient reinforcement learning formulation for NAS. To assess the effectiveness of our GraphPNAS, we conduct extensive experiments on three search spaces, including the challenging RandWire on TinyImageNet, ENAS on CIFAR10, and NAS-Bench-101/201. The complexity of RandWire is significantly larger than other search spaces in the literature. We show that our proposed graph generator consistently outperforms RNN-based one and achieves better or comparable performances than state-of-the-art NAS methods.
translated by 谷歌翻译
There has been a recent explosion of impressive generative models that can produce high quality images (or videos) conditioned on text descriptions. However, all such approaches rely on conditional sentences that contain unambiguous descriptions of scenes and main actors in them. Therefore employing such models for more complex task of story visualization, where naturally references and co-references exist, and one requires to reason about when to maintain consistency of actors and backgrounds across frames/scenes, and when not to, based on story progression, remains a challenge. In this work, we address the aforementioned challenges and propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context across the generated frames. Sentence-conditioned soft attention over the memories enables effective reference resolution and learns to maintain scene and actor consistency when needed. To validate the effectiveness of our approach, we extend the MUGEN dataset and introduce additional characters, backgrounds and referencing in multi-sentence storylines. Our experiments for story generation on the MUGEN, the PororoSV and the FlintstonesSV dataset show that our method not only outperforms prior state-of-the-art in generating frames with high visual quality, which are consistent with the story, but also models appropriate correspondences between the characters and the background.
translated by 谷歌翻译
ICECUBE是一种用于检测1 GEV和1 PEV之间大气和天体中微子的光学传感器的立方公斤阵列,该阵列已部署1.45 km至2.45 km的南极的冰盖表面以下1.45 km至2.45 km。来自ICE探测器的事件的分类和重建在ICeCube数据分析中起着核心作用。重建和分类事件是一个挑战,这是由于探测器的几何形状,不均匀的散射和冰中光的吸收,并且低于100 GEV的光,每个事件产生的信号光子数量相对较少。为了应对这一挑战,可以将ICECUBE事件表示为点云图形,并将图形神经网络(GNN)作为分类和重建方法。 GNN能够将中微子事件与宇宙射线背景区分开,对不同的中微子事件类型进行分类,并重建沉积的能量,方向和相互作用顶点。基于仿真,我们提供了1-100 GEV能量范围的比较与当前ICECUBE分析中使用的当前最新最大似然技术,包括已知系统不确定性的影响。对于中微子事件分类,与当前的IceCube方法相比,GNN以固定的假阳性速率(FPR)提高了信号效率的18%。另外,GNN在固定信号效率下将FPR的降低超过8(低于半百分比)。对于能源,方向和相互作用顶点的重建,与当前最大似然技术相比,分辨率平均提高了13%-20%。当在GPU上运行时,GNN能够以几乎是2.7 kHz的中位数ICECUBE触发速率的速率处理ICECUBE事件,这打开了在在线搜索瞬态事件中使用低能量中微子的可能性。
translated by 谷歌翻译
我们证明,就机器学习政策的参数而言,自然梯度下降承认了与自然选择进化一致的共轭动力描述。我们将这些共轭动力学表征为对连续时间复制器动力学的本地最佳拟合,并表明价格方程适用于策略架构和参数生成的希尔伯特空间的函数等效类别。我们认为,“共轭自然选择”直观地解释了自然梯度下降的经验有效性,同时为机器学习动力学开发了有用的分析方法。
translated by 谷歌翻译
场景图生成的任务需要在给定图像(或视频)中识别对象实体及其相应的交互谓词。由于组合较大的解决方案空间,现有的场景图生成方法假设关节分布的某些分解以使估计可行(例如,假设对象在有条件地与谓词预测无关)。但是,在所有情况下,这种固定的分解并不是理想的(例如,对于相互作用中需要的对象很小且本身不可辨别的图像)。在这项工作中,我们建议使用马尔可夫随机字段中传递消息,提出一个针对场景图生成的新颖框架,并在图像上引入动态调节。这是作为迭代改进过程实现的,其中每个修改都在上一个迭代中生成的图上进行条件。跨改进步骤的这种条件允许对实体和关系进行联合推理。该框架是通过基于小说和端到端的可训练变压器建筑实现的。此外,建议的框架可以改善现有的方法性能。通过有关视觉基因组和动作基因组基准数据集的广泛实验,我们在场景图生成上显示了改善的性能。
translated by 谷歌翻译